Mac100% 1998 November

home *** CD-ROM | disk | FTP | other *** search

/ Mac100% 1998 November / MAC100-1998-11.ISO.7z / MAC100-1998-11.ISO / ｵﾝﾗｲﾝｿﾌﾄ定点観測 / ﾕｰﾃｨﾘﾃｨ / Mops 3.2.sea / Mops 3.2 / Mops source / PPC source / cg1 < prev next >

Wrap

Text File | 1998-06-22 | 21KB | 732 lines

marker m__cg1 ¥ ========================================= ¥ POWERPC CODE GENERATOR ¥ ========================================= true constant assertions? ¥ make it false when everything really works PPC? [IF] false constant debug? [ELSE] false constant debug? [THEN] false constant 64bit? ¥ =============== OPTIMIZATION FLAGS =============== (* it's useful to be able to turn these off, to find those recalcitrant bugs - and in the case of range checking, to be able to turn it off for speed-critical tested code. Or for floating multiply-add, someone might want to turn it off to restore strict IEEE FP semantics. NOTE: the default for all these boolean values is TRUE. *) true value hoist? ¥ can we move ops out of loops if ¥ they're invariant in the loop? true value hoist_fetches? ¥ can we move fetches to as early ¥ as possible? true value allow_match? ¥ true if we can eliminate an op if ¥ we find the result already in ¥ a register. Basically the same ¥ as CSE (common subexpression ¥ elimination). true value move_by_recompiling? ¥ can we try to recompile an earlier ¥ op to avoid a reg move? true value optimize_leaf_calls? ¥ do we use the fast calling sequence ¥ for leaf definitions? true value optimize_branches? ¥ do we simplify branches over other ¥ branches, branches to the next ¥ instruction, etc.? true value cascade? ¥ do we combine ops into single ¥ instructions where we can? true value range_check? ¥ do we range-check array accesses? true value multiply-add? ¥ do we cascade a floating multiply ¥ followed by an add into a ¥ floating multiply-add instruction ¥ (or one of its variants)? ¥ =============== FORWARD DEFNS ================ forward .G forward .F forward .C forward .GS forward .CS forward .AL forward .FAL forward .FR forward ZS ¥ =============== OTHER GLOBALS ============== 0 value backstop_CDP 0 value fetch_backstop 0 value basic_block_start 0 value loop_start 0 value max_called_#PL 0 value max_called_#FPL false value equalizing? ¥ we don't actually use this one currently false value eq_block_recompiling_move? bytestring eq_ranges bytestring const_data bytestring sv_const_data (* Notes on PPC code generation: The general way to generate optimized code for an architecture with n registers is to analyse each basic block and generate a directed acyclic graph (DAG) whose nodes represent each value generated and whose edges represent uses of those values. Note it isn't a tree, since more than one edge can go into a node. Then optimization can be done, and common subexpressions (i.e. nodes) combined. Then, using graph coloring or whatever, the nodes are assigned to registers. This assumes there are less regs than nodes, so that we have to find all dependencies, and re-use a reg when its old value isn't needed any more. Here we can simplify things a little, and manage everything in one pass. On the PPC we have so many regs that for normal shortish basic blocks, we will generally be able to simply assign a different reg for each node. On the few occasions when we don't have a free reg, we can select one which has a value with no outstanding references - there's a slight chance we could have used this value in a subsequent op, but this is pretty well negligible. We can optimize on the fly: each time we get a new value, we can search the reg set for a match so we can use a combined node. This code is invoked as follows: Whenever Handlers is called to generate 68K code, if PPC? is true, the Handlers selector and opcode is pushed and PPCvec is executed. We will set PPCvec to point to PPC_compile which will use the selector to dispatch to the right routine. Notes on memory architecture: On the PPC, code and data is normally kept separate, with the code being read-only. The PEF format defines separate code and data sections, and at launch time these are placed in pointer-based blocks in memory. We ought to conform to this convention for installed apps, since it will give the best performance. In the development environment, our dictionary should be in a read-write block for obvious reasons. We'll define a separate data area, which will become the data section of an installed app. These two areas can be in handles, which will allow us to resize them on the fly if necessary. At installation time, we'll generate a PEF in which the dictionary is added to the existing code section, and the data area is added to the data section. Our initial nucleus will probably be generated exactly this way, either from the 68k or PPC version. Thus our initial nucleus is simply an application, which transforms itself into the Mops development environment by creating a handle for the new dictionary and data areas. Now, we could allow the new dictionary area to link itself to the old one by some cleverness, but since the dic would only be getting split at one place I don't think it would be worth the complexity. Better to just BlockMove the old dic area to the start of the new when the dev environment is being set up, and likewise for the data area. Colon uses a new header format incorporating two flag bytes in addition to what we use on the 68k, and also has to observe 4-byte alignment for the code. For this reason we generate a full PPC-style header (including 4-byte alignment) for colon definitions. When we compile a call to a colon defn we have to do PPC-style alignment even on the 68k. For other types of definition we don't have to bother, and the code should work on both platforms (since on the PPC FIND will look after the extra alignment). The 2 flag bytes are organized as 4 nybbles: nybble 0: bit 0: 1 if it's a leaf defn bit 1: 1 if it alters the count register bit 2: spare bit 3: 1 if it alters any FP reg nybble 1: number of results in registers on exit nybble 2: number of named parameters nybble 3: number of named parms + locals (doing it this way is a bit more convenient, and the max number of parms+locals is only 11, so we have enough bits) If the definition does any floating operations or anything that alters an FP reg, we need more flags, so we add an extra 32 bits. For alignment, we have to take at least 32, so we might as well make the most of it. byte 0: byte 1: bytes 2 and 3 are 4 nybbles: nybble 0: nybble 1: number of floating results in registers on exit nybble 2: number of named floating parameters nybble 3: number of named floating parms + locals ============= REGISTER DEFINITIONS: ================ Notes on reg assignment: We need 3 regs for scratch in boilerplate code sequences. We'll use r0, which is a bit unusual anyway, and r11 and r12 (see below). r1 is the stack pointer, r2 is RTOC. We can't monkey with these. r11 and r12 are used in the calling sequence for external calls. Apple says they can be used as scratch at all other times, so we'll use them in boilerplate sequences. PowerMacForth does this too, and in the assembler they're aliased as rX and rY. For external calls, r3-r10 are used for parameters, and r3 for a returned result. They won't be saved over the calls. Of course for internal Mops calls we can do what we like. We can use these regs for general operands, and on an external call normalize the stack so that the required number of cells are stored in r3 on. At that stage we won't have any other cached stack cells, so we don't need the regs preserved anyway. This scenario gives us 8 regs for general operands, i.e. cached stack cells (r3-r10), which should be enough. If it turns out not to be enough we could grab a couple of the regs we've allocated for locals (see below). r13-31 are "non-volatile" - they're saved over external calls. For internal Mops calls we just need to save the locals, since other regs like the base address regs don't get changed. Now for the special regs we need. These all need to be saved over external calls, and so are in the non-volatile block. For addressing the dictionary, a difference to the 68k version is that we need to keep code and data separate (see above). In the development environment these will be in handle-based blocks, and in an installed app they'll be defined in the PEF which will make them end up in pointer- based blocks. Anyway, as long as we handle the addressability questions properly, it shouldn't matter where they are. Code references will be for branches, constants and other constant data like literal strings and class info. Constants can always be handled via literal instructions. Even if it needs more than 16 bits it can be generated in a reg with 2 instructions which will be faster than a memory reference. Branches will always be self-relative, and they have enough displacement bits to get us anywhere. For other constant data, however, it's extremely handy to have a base reg available, even if these references aren't performance-critical. Thus, if we still have modules (which is still up for grabs), we'll need 4 base regs - 2 to address code and 2 for data. Note that on entry, r2 (RTOC) always points to the start of the data area as defined in the PEF. But we can't use RTOC as a regular base reg, since in the dev environment our data area will be off in a separate handle. Other regs we need are RP (return stack pointer), the loop variable I, and the base address of the current object. Now we may as well use one of our "local" registers for I, since it will be very rare for us to need all of them. This will mean one less local in definitions that use I, but that's not a problem. It seems we should go for a separate FP stack, since the Scientific Library is now using this. This will also give better code on the PPC, since when we bring stack cells back from mem to regs, we'll always know which regs to move them to. The floating stack pointer probably should be in a register. So in all we'll need 7 special regs out of the non-volatile block. This leaves r19-r31 for locals, which means that if we limit the number of locals to 13, we can keep them all in registers. This looks reasonable. **** Side note: we won't use the lmw and stmw instructions, since Keith D has warned me that they'll go away in future! This means that we can use their values as pseudo-opcodes, since we know that Mops will never generate them with their proper meaning. The values in question are primary opcodes 46 and 47, which means $B8xxxxxx to $BFxxxxxx. In particular, we'll use $BE00 as a handler field for PPC colon definitions. (Handler fields will be 4-byte aligned, and so will appear in the same place as instruction opcodes.) This will make life easier for the disassembler. Some notes on register handling across internal Mops calls: Saving and restoring local regs can be a bit long-winded, so to save space we should normally do what we do on the 68k - that is, at the start of each defn, save whatever regs we need for locals, and restore them at the end. For EXIT, instead of doing everything inline as we did on the 68k, we'll do a branch to the semicolon. This is almost as fast (esp. as it's an unconditional branch), and saves space. We'll probably make an exception for leaf procs, since these get executed so frequently. What I'm currently planning, in the case where the leaf proc has named parms/locals, is to do the houskeeping in the calling routine instead of in the called leaf proc. This will give me the possibility of generating the parms straight in the required regs. Also I might be able to do the saving and restoring of the needed regs at the beginning and end of the calling routine (depending on what parms/locals that routine might need. This would get these housekeeping operations out of any inner loops. This alternative calling convention should certainly be faster, but will take a lot more space, so I won't do it all the time. ======================================== *) 0 constant CR0 4096 constant sys_SP_framesize 8 constant FPcell (* We don't move the stack pointers every time we push and pop something - rather, we keep track of the accumulated offset here and only adjust when we have to. *) 0 value STK_OFFSET 0 value FSTK_OFFSET ¥ ================= OD FIELD VALUES ================== ¥ Here we define various values for OD fields. We generally use the same ¥ names as for the 68k code generator, although the values aren't all ¥ the same (which doesn't matter anyway). ¥ Mode values: enum { mdGPR mdBD mdAbs mdLit mdPC mdFPn } PPC? [IF] hexx [ELSE] hex [THEN] ¥ Flag byte bits (opFlags field): 0 constant flExt ¥ Sign extend 1 constant fbExt 1 constant flFP ¥ Floating operation 2 constant fbFP 2 constant flLit ¥ Floating Literal 4 constant fbLit 3 constant flFCR ¥ FPU constant ROM reference 8 constant fbFCR ¥ 10 constant fbNoRecompile? ¥ Operation type byte: ¥ 0 means empty, 1 to 7 mean not empty but unknown for some reason. ¥ So far we've only defined 1 and 2. 1 constant otUnknown ¥ isn't empty, but we don't know anything about ¥ the contents 2 constant otUnkStored ¥ ditto, but we've stored it and so might be ¥ able to optimize a fetch from the same ¥ location, even tho we don't know its type 3 constant otUnkPulled ¥ ditto, pulled from memory part of stack 7 constant otUnknownCodes ¥ all codes less than or equal to this are ¥ some kind of empty/unknown 8 constant otMove ¥ reg move ¥ the following are also on the 68k, and in target compilation they ¥ dispatch to us from the 68k interpreter using these codes, so we ¥ can't change them. 12 constant otMUL 13 constant otDIV 14 constant otUDIV ¥ the following are PPC only: 10 constant otMULH ¥ multiply high 11 constant otUMULH ¥ multiply high 16 constant otAddc ¥ 17 constant otAddic 17 constant otAdde 18 constant otAddze 19 constant otAddme 1A constant otSubfc ¥ 1C constant otSubfic 1B constant otSubfe 1C constant otSubfze 1D constant otSubfme ¥ these following are also 68k codes, and can't change 21 constant otADD 22 constant otSUB 23 constant otAND 24 constant otOR 25 constant otXOR 26 constant otCMP 27 constant otUCMP ¥ unsigned compare - PPC only 28 constant otNEG 29 constant otNOT 2A constant otShift 2B constant otShift&mask ¥ PPC only 2C constant otTrap ¥ ditto 30 constant otPMend ¥ End of integer ops ¥ 3F constant otFPcmp ¥ Floating-point comparison. A special case. 40 constant otFPstart ¥ Start of regular floating-point ops. Note ¥ these are NOT in the same order as the ¥ integer ops. 40 constant otFMOVE 41 constant otFADD 42 constant otFMUL 43 constant otFMADD ¥ multiply-add - PPC only 48 constant otFPnoncom ¥ The following FP ops are non-commutative 48 constant otFSUB 49 constant otFDIV 4F constant otFPcmp ¥ FP comparisons 50 constant otFPstore 52 constant otFPfetch 54 constant otFPmon ¥ The following FP ops are monadic (* we don't bother including these in the dictionary, but these are the values: 54 constant otFABS 55 constant otFNEG 56 constant otFSIN 57 constant otFCOS 58 constant otFTAN 59 constant otFATAN 5A constant otFSQRT *) 5F constant otFPend ¥ End of FP ops 60 constant otStore ¥ Store 80 constant otDeferredStore 62 constant otFetch ¥ Direct fetch otFetch constant otAt ¥ Indirect fetch - ends up being treated ¥ the same 62 constant otDUP ¥ Stack shuffling 63 constant ot2DUP 64 constant otDROP 65 constant ot2DROP 66 constant otSWAP 67 constant otOVER (* 68 constant otNIP 69 constant otTUCK 6A constant otROT 6B constant otDOWN 6C constant ot2SWAP *) ¥ For FP stack shuffling, we'll use the corresponding ¥ opcodes in the range 72 - 7B, but not bother defining ¥ constants and cluttering the dic too much. ¥ Subtype byte values: ¥ For comparisons, the top 4 bits of the byte give the condition register field ¥ bit which we need to branch on for this condition: LT = 0, GT = 1, EQ = 2. ¥ The low bit (bit 7) is 1 if the condition we want has a field bit value of 1. ¥ Bit 6 is 1 for unsigned, and bit 5 is 1 for comparisons with zero. 20 constant cmpNE 21 constant cmpEQ 00 constant cmpGE 01 constant cmpLT 10 constant cmpLE 11 constant cmpGT 02 constant cmpHS 03 constant cmpLO 12 constant cmpLS 13 constant cmpHI 24 constant cmpZNE 25 constant cmpZEQ 04 constant cmpZGE 05 constant cmpZLT 14 constant cmpZLE 15 constant cmpZGT ¥ here's a table to map our 68k comparison codes to the above PPC ones: PPC? [IF] createx comparison_codes 0 cx, ¥ 0 0 cx, ¥ 1 cmpHI cx, ¥ 2 cmpLS cx, ¥ 3 cmpHS cx, ¥ 4 cmpLO cx, ¥ 5 cmpNE cx, ¥ 6 cmpEQ cx, ¥ 7 0 cx, ¥ 8 0 cx, ¥ 9 0 cx, ¥ A 0 cx, ¥ B cmpGE cx, ¥ C cmpLT cx, ¥ D cmpGT cx, ¥ E cmpLE cx, ¥ F decimalx [ELSE] create comparison_codes 0 c, ¥ 0 0 c, ¥ 1 cmpHI c, ¥ 2 cmpLS c, ¥ 3 cmpHS c, ¥ 4 cmpLO c, ¥ 5 cmpNE c, ¥ 6 cmpEQ c, ¥ 7 0 c, ¥ 8 0 c, ¥ 9 0 c, ¥ A 0 c, ¥ B cmpGE c, ¥ C cmpLT c, ¥ D cmpGT c, ¥ E cmpLE c, ¥ F decimal [THEN] 0 value OPERATION 0 value SUBOPERATION ¥ Some types of instruction need special treatment - e.g. AND etc. use the ¥ rA field for the destination. So we define some types: enum { noType loadStoreType arithType logicalType cmpType branchType shiftType } ¥ ================= UTILITY WORDS ================== : MONADIC? ¥ ( opcode -- opcode b ) otNeg otNot within? IF true EXIT THEN otFPmon otFPend within? ; : MEM_REFERENCING? ¥ ( opcode -- b ) SELECT[ otFetch ], [ otStore ], [ otFPFetch ], [ otFPStore ]=> true DEFAULT=> drop false ]SELECT ; PPC? [IF] : dasm ; ¥ can't disassemble in native mode yet! : z ; [ELSE] forward dasm ¥ disassembles what we've done so far forward dcurr ¥ disassembles the current defn (even if not finished) forward Z ¥ ends ppc compilation and disassembles. [THEN] ¥ GetImmediateOp does the same when we're going to execute the operation ¥ now, returning the xt of the word to execute. : GETIMMEDIATEOP { opType subtype -- xt } opType SELECT[ otAdd ]=> ['] + [ otSub ]=> ['] - [ otAND ]=> ['] and [ otOR ]=> ['] or [ otXOR ]=> ['] xor [ otMUL ]=> ['] * [ otDIV ]=> ['] / [ otNEG ]=> ['] negate [ otNOT ]=> ['] not [ otCMP ]=> subtype CASE[ cmpNE ]=> ['] <> [ cmpEQ ]=> ['] = [ cmpGE ]=> ['] >= [ cmpLT ]=> ['] < [ cmpLE ]=> ['] <= [ cmpGT ]=> ['] > [ cmpZNE ]=> ['] 0<> [ cmpZEQ ]=> ['] 0= [ cmpZGE ]=> ['] 0>= [ cmpZLT ]=> ['] 0< [ cmpZLE ]=> ['] 0<= [ cmpZGT ]=> ['] 0> DEFAULT=> cr .h ." undef otCMP subtype in getImmediateOp" 1 die ]CASE [ otUCMP ]=> subtype CASE[ cmpHS ]=> ['] u>= [ cmpLO ]=> ['] u< [ cmpLS ]=> ['] u<= [ cmpHI ]=> ['] u> DEFAULT=> cr .h ." undef otUCMP subtype in getImmediateOp" 1 die ]CASE [ otShift ]=> subtype CASE[ 0 ]=> ['] << [ 1 ]=> ['] >> DEFAULT=> cr .h ." undef otShift subtype in getImmediateOp" 1 die ]CASE DEFAULT=> cr .h ." undef op passed to getImmediateOp" 1 die ]SELECT ; PPC? not [IF] ¥ we put this in pnuc3 in target mode ¥ 16bits? ( n signed? -- n b ) ¥ returns true if n will fit in 16 bits (signed or unsigned as requested). : 16BITS? ¥ ( n signed? -- n b ) IF -32768 32767 within? ELSE dup 16 >> 0= THEN ; [THEN] : SIGNED? ¥ ( operation - b ) ¥ Returns true if this is a signed op - assumed to be the ¥ default. We just return false for the specific unsigned ¥ ops, and true for everything else. SELECT[ otUDIV ], [ otAND ], [ otOR ], [ otXOR ], [ otUCMP ]=> false DEFAULT=> drop true ]SELECT ; : REVERSE_COMPARISON ¥ called if we swap operands of a compare. Adjust ¥ subOperation appropriately. subOperation $ 24 and ?EXIT ¥ out if monadic or EQ or NE $ 10 xor> subOperation ; ¥ This must come before we redefine @ABS below! PPC? not [IF] from pasmMod import{ :PPC_code ;PPC_code disasm disasm_word disasm_xt disasm_rng disasm_cnt disasm_one set_disasm_call_range } compile: pasmMod ¥ While still in 68k-land, we need a PPC-style reloc! and @abs. The proper ¥ PPC versions will be compiled in pnuc3 and setup respectively. Note, ¥ any changes must of course be made in both places. : RELOC! { theAddr dest -- } theAddr addr>S&D $ ffffff and swap 24 << or dest ! ; : reloc, DP reloc! 4 ++> DP ; : relocCode, CDP reloc! 4 ++> CDP ; : displCode, CDP displ! 4 ++> CDP ; ¥ In the target compilation, we only have 4 segments. : @ABS { addr ¥ relocAddr seg# displ -- absAddr } addr @ -> relocAddr relocAddr $ ffffff and -> displ relocAddr 24 >> -> seg# seg# SELECT[ 8 ]=> code_start [ 9 ]=> data_start [ 10 ]=> seg_code_start [ 11 ]=> seg_data_start DEFAULT=> ." not a reloc addr" 1 die ]SELECT displ + ; : @abs6 @abs ; ¥ a variant name we can call later when @abs ¥ has been redefined in the PPC image [THEN]